Topological feature extraction of lung tumor CT scans

In this notebook, we will illustrate how to extract topological features from tumors in the lungs. We start by setting the working directory and importing the required libraries.

Loading an example scan

We load an example scan/tumor from which we will obtain topological features.

Persistent homology of the image pixels (segmentation only)

There are various ways to compute persistent homology from segmented tumors in CT scans. In each case we need a filtration, i.e., a sequence of simplicial complexes ordered by inclusion, from which we subsequently compute persistent homology. In the first approach we show, the filtration is obtained directly from the values of the segmented image pixels in the CT scan.

We now use the intensity of the tumor pixels in the segmentation to construct the filtration. This is visualized as follows.
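As a minimal sketch of such an intensity-based filtration, the snippet below builds nested sublevel sets on a toy array. It assumes a sublevel-set convention (pixels enter in order of increasing intensity) and that pixels outside the segmentation are excluded by assigning them an infinite value; the arrays `image` and `mask` are hypothetical stand-ins for the scan and its segmentation.

```python
import numpy as np

# Toy 2D "scan" and binary tumor mask (hypothetical stand-ins).
image = np.array([[5, 1, 4],
                  [2, 9, 3],
                  [6, 7, 8]], dtype=float)
mask = np.array([[0, 1, 1],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=bool)

# Assumption: pixels outside the segmentation get +inf,
# so they never enter the filtration.
filtration_values = np.where(mask, image, np.inf)


def sublevel_set(values, threshold):
    """Boolean image of the pixels whose filtration value is <= threshold."""
    return values <= threshold


# One filtration step per distinct finite intensity, in increasing order.
thresholds = np.unique(filtration_values[np.isfinite(filtration_values)])
steps = [sublevel_set(filtration_values, t) for t in thresholds]

# Each step contains the previous one, which is what makes this a filtration.
is_nested = all((a <= b).all() for a, b in zip(steps, steps[1:]))
```

The last step equals the segmentation mask itself, and each earlier step is a subset of it.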

Persistent homology computes the birth and death times of 'holes' across such a filtration. Holes are distinguished by their dimension: connected components (H0), cycles (H1), and voids (H2). For our example filtration above, persistent homology can be computed from the image pixels as follows.
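To make birth and death times concrete, here is a minimal sketch that computes only the H0 part (connected components) of a sublevel-set filtration with a union-find structure. It is an illustration, not the library routine used in this notebook, and it assumes that pixels act as vertices joined to their 4 neighbours and that pixels with value +inf are excluded. The function name `h0_persistence` is hypothetical.

```python
import numpy as np


def h0_persistence(values):
    """(birth, death) pairs of connected components in the sublevel-set
    filtration of a 2D array, using 4-neighbour adjacency.
    Pixels with value +inf are ignored."""
    rows, cols = values.shape
    flat = values.ravel()
    # Process finite pixels in order of increasing value.
    order = [int(i) for i in np.argsort(flat, kind="stable")
             if np.isfinite(flat[i])]
    parent = {}  # union-find forest over the pixels added so far
    birth = {}   # birth value, consulted only for component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        value = float(flat[i])
        parent[i] = i
        birth[i] = value
        r, c = divmod(i, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                j = rr * cols + cc
                if j in parent:
                    a, b = find(i), find(j)
                    if a != b:
                        # Elder rule: the younger component dies at the merge.
                        old, young = (a, b) if birth[a] <= birth[b] else (b, a)
                        if birth[young] < value:  # skip zero-persistence pairs
                            pairs.append((birth[young], value))
                        parent[young] = old
    # Components that are never merged away have infinite death time.
    roots = {find(i) for i in parent}
    pairs.extend((birth[root], np.inf) for root in roots)
    return sorted(pairs)


diagram_h0 = h0_persistence(np.array([[1.0, 3.0, 2.0]]))
print(diagram_h0)  # [(1.0, inf), (2.0, 3.0)]
```

In the toy row, the pixels with values 1 and 2 start separate components, and the pixel with value 3 joins them, so the younger component (born at 2) dies at 3 while the older one never dies.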

Persistence diagrams visualize the results of a persistent homology computation. For each hole that is born at time b and dies at time d, the diagram marks the point (b, d), which lies above the diagonal in the Euclidean plane. Note that d may be infinite, which corresponds to a hole that never dies. For example, when the image includes at least one pixel, the filtration has at least one connected component that never dies. Since holes are distinguished by their dimension, there is one persistence diagram per dimension. We can visualize them as follows.

Persistent homology of the image pixels (with background)

We can also quantify the topological information of our tumor while including information from the surrounding tissue. The procedure is exactly the same as above, except that the filtration now includes all pixels in a bounding box of the segmentation.
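A minimal sketch of the bounding-box step, assuming the scan and its segmentation are NumPy arrays of the same shape and that the mask is non-empty. The helper `bounding_box_crop` and its `margin` parameter are hypothetical and only illustrate the idea.

```python
import numpy as np


def bounding_box_crop(image, mask, margin=0):
    """Crop `image` to the axis-aligned bounding box of the nonzero entries
    of `mask`, optionally padded by `margin` pixels (clipped to the image)."""
    idx = np.argwhere(mask)  # coordinates of the segmented pixels
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + 1 + margin, mask.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return image[slices]


# Toy example: a 5x5 "scan" with a 2x2 segmented region.
toy_image = np.arange(25, dtype=float).reshape(5, 5)
toy_mask = np.zeros((5, 5), dtype=bool)
toy_mask[1:3, 2:4] = True

box = bounding_box_crop(toy_image, toy_mask)
padded_box = bounding_box_crop(toy_image, toy_mask, margin=1)
```

All pixels of the cropped array, including background tissue, then enter the filtration with their own intensities. The same helper works unchanged for 3D volumes, since it slices along every axis of the mask.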

Persistent homology of the tumor surface point cloud

Finally, we can also quantify the topological information of the tumor through the vertices of its surface mesh. While the diagrams above mainly quantify textural properties of the tumor, the points on its surface capture more global shape properties.

The point clouds are generally too large to compute 2-dimensional persistent homology efficiently. We can overcome this issue by using a theoretically justified approximation algorithm that computes our diagrams from a set number of landmarks. This procedure is implemented in the Ripser library: https://pypi.org/project/ripser/.
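To illustrate what landmark selection does, here is a minimal NumPy sketch of greedy (farthest-point) sampling. It is not the Ripser implementation; as far as we are aware, the Python package exposes a subsampling of this kind through the `n_perm` argument of `ripser.ripser`. The function `greedy_landmarks` and its parameters are hypothetical.

```python
import numpy as np


def greedy_landmarks(points, n_landmarks, start=0):
    """Pick `n_landmarks` indices by farthest-point sampling.

    Returns the chosen indices and the covering radius, i.e. the largest
    distance from any point to its nearest landmark."""
    points = np.asarray(points, dtype=float)
    chosen = [start]
    # Distance from every point to its nearest landmark so far.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_landmarks - 1):
        nxt = int(np.argmax(dist))  # the point farthest from all landmarks
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen), float(dist.max())


# Toy point cloud on a line (stand-in for surface mesh vertices).
cloud = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
landmark_idx, radius = greedy_landmarks(cloud, n_landmarks=2)
```

The covering radius indicates how well the landmarks represent the full cloud: the smaller it is, the closer the approximate diagrams can be expected to be to those of the full point cloud.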

Topological feature vectorization

Many methods for learning from persistence diagrams have been developed, and this remains an active area of research. For further exploration, we recommend the GUDHI library, which provides many procedures compatible with sklearn: http://gudhi.gforge.inria.fr/python/latest/representations.html.

Here we consider a very simple fixed-size vectorization of each persistence diagram (one diagram per type of filtration and dimension of hole), which can be computed as follows.
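As one possible example of a fixed-size vectorization, the sketch below summarizes a diagram by a few statistics of its finite lifetimes. The choice of statistics (count, total, mean, and maximum lifetime) is an assumption made for illustration and is not necessarily the vectorization used in this notebook; the function name `diagram_statistics` is hypothetical.

```python
import numpy as np


def diagram_statistics(diagram):
    """Map a persistence diagram (rows of [birth, death]) to a length-4 vector:
    number of finite points, total, mean, and maximum lifetime.
    Points with infinite death time are dropped."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    finite = diagram[np.isfinite(diagram[:, 1])]
    lifetimes = finite[:, 1] - finite[:, 0]
    if lifetimes.size == 0:
        # An empty diagram still maps to a vector of the same length.
        return np.zeros(4)
    return np.array([lifetimes.size, lifetimes.sum(),
                     lifetimes.mean(), lifetimes.max()])


toy_diagram = [(0.0, 1.0), (0.0, 3.0), (1.0, np.inf)]
features = diagram_statistics(toy_diagram)  # lifetimes 1 and 3 -> [2, 4, 2, 3]
```

Concatenating one such vector per filtration type and homology dimension gives a feature vector of fixed length for every tumor, regardless of how many points each diagram contains.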